Structured queries, language modeling, and relevance modeling in cross-language information retrieval
نویسندگان
چکیده
Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries – one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.
منابع مشابه
A database approach to information retrieval: The remarkable relationship between language models and region models
In this report, we unify two quite distinct approaches to information retrieval: region models and language models. Region models were developed for structured document retrieval. They provide a well-defined behaviour as well as a simple query language that allows application developers to rapidly develop applications. Language models are particularly useful to reason about the ranking of searc...
متن کاملHighly Relevant Documents Lost in CLIR: Experiments with Dictionary Translation and Pseudo-Relevance Feedback
Research on cross-language information retrieval (CLIR) has typically been restricted to settings using binary relevance assessments. In this paper, we present evaluation results for dictionary-based CLIR using graded relevance assessments in a best match retrieval environment. A text database containing newspaper articles and a related set of 35 search topics were used in the tests. First, mon...
متن کاملEstimating structural relevance of XML elements through language model
Language modeling approaches have been extensively used as an effective way of measuring ad-hoc document content relevance. However, in structured information retrieval (SIR) there is to our knowledge no approach which aims at assessing structural relevance using language models. In this paper we present a language model based on document-query structure likelihood. As the effectiveness of lang...
متن کاملExperiments in Cross Language Query Focused Multi-Document Summarization
The twin challenges of massive information overload via the web and ubiquitous computers present us with an unavoidable task: developing techniques to handle multilingual information robustly and efficiently, with as high quality performance as possible. Previous research activities on multilingual information access systems have studied cross-language information retrieval (CLIR), information ...
متن کاملAmharic-English Information Retrieval with Pseudo Relevance Feedback
We describe cross language retrieval experiments using Amharic queries and English language document collection from our participation in the bilingual ad hoc track at the CLEF 2007. Two monolingual and eight bilingual runs were submitted. The bilingual experiments designed varied in terms of usage of long and short queries, presence of pseudo relevance feedback (PRF), and three approaches (max...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Inf. Process. Manage.
دوره 41 شماره
صفحات -
تاریخ انتشار 2005